Chapter 22

Comparing Survival Times

IN THIS CHAPTER

Using the log-rank test to compare two groups

Thinking about more complicated ways to compare the survival experience

Calculating the necessary sample size to compare survival times

The life table and Kaplan-Meier survival curves described in Chapter 21 are ideal for summarizing
and describing the time to the first or only occurrence of a particular event based on times observed in
a sample of individuals. They correctly incorporate censored data, which arise when an individual is
observed during the study but does not experience the event. Animal and human studies involving
endpoints that occur on a short time-scale, like measurements taken during an experimental surgical
procedure, may yield totally uncensored data. However, the more common situation is that not all
individuals experience the event during the observation period of the study, so you usually have
censored data on your hands.
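
To make the idea of censoring concrete, here is a minimal Python sketch of how such data are often recorded: each individual gets a follow-up time plus a flag saying whether the event was actually observed. The times, the flag layout, and the helper name `summarize_censoring` are illustrative assumptions, not data or notation from this book.

```python
# Hypothetical follow-up data: (time in months, event observed?)
# True  = the event occurred at that time
# False = the individual was still event-free when last observed (censored)
observations = [
    (5.0, True),
    (8.0, False),
    (12.0, True),
    (12.0, False),
    (20.0, False),
]


def summarize_censoring(obs):
    """Return (number of events, number of censored observations)."""
    n_events = sum(1 for _, observed in obs if observed)
    n_censored = len(obs) - n_events
    return n_events, n_censored


n_events, n_censored = summarize_censoring(observations)
print(f"{n_events} events, {n_censored} censored out of {len(observations)}")
```

If every flag were True, the data would be totally uncensored, as in the short time-scale studies mentioned above.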

In biological research and especially in clinical trials (discussed in Chapter 5), you often want to
compare survival times between two or more groups of individuals. In humans, this may have to do
with survival after cancer surgery. In animals, it may have to do with testing the toxicity of a potential
therapeutic. This chapter describes the log-rank test, an important method for comparing survival
curves between two groups, and explains how to calculate the sample size you need to have
sufficient statistical power for this test (see Chapter 3). The log-rank test can be extended to handle
three or more groups, but this discussion is beyond the scope of this book.

In this chapter, as in Chapters 21 and 23, we use the term survival in reference to the outcome
of death. However, all the calculations pertain to any type of outcome event being studied,
including good ones, such as cancer going into remission.

There is some ambiguity associated with the name log-rank test. It has also been called by
other names (such as the Mantel-Cox test), and has been extended into variants such as the
Gehan-Breslow test. You may also observe that different software calculates the log-rank
test slightly differently. In this chapter, we describe the most commonly used form of the log-rank
test.

If you have no censored observations in your data, you can skip most of this chapter. This may